French patent FR3021432A1: PROCESSOR WITH CONDITIONAL INSTRUCTIONS

Abstract:
Method of processing machine instructions by a processor, comprising the following steps: - receiving a machine instruction to be executed, said machine instruction comprising the identification of at least one first operation to be executed and a conditional prefix representing a condition to be verified in order to execute said at least one first operation, - evaluating said prefix, and - executing or not executing said at least one first operation identified in said machine instruction, according to whether said condition is verified or not.
Publication number: FR3021432A1
Application number: FR1454511
Filing date: 2014-05-20
Publication date: 2015-11-27
Inventor: Ghassan Chehaibar
Applicant: Bull SA
Main IPC class:

Description:

[0001] The present invention relates to the field of processors for computing devices. It concerns in particular processors dedicated to searching for elements in lists. Certain types of computing devices, such as network cards supporting MPI ("Message Passing Interface"), make intensive use of searches for elements in lists. For example, the messages expected by a node of a communication network are listed together with their respective storage locations, and every message arriving at the node is compared with those of the list. Thus, when a message arrives, it can be routed to its storage location to be processed. Conventionally, each incoming message carries a label that must be compared with the labels of the messages in the list. The labels of the messages in the list may be masked so that the comparison is made on a reduced number of bits. When a message arrives, its label is compared with that of the first item in the list, then the second, then the third, and so on, until a matching label is found. The incoming message is then routed to the corresponding storage location and the matching item is deleted from the list, which is updated accordingly. The list of expected messages is therefore modified dynamically: items are removed when a corresponding message arrives and added when a new message is expected. Implementing this type of search requires complex routing and list-management algorithms, which are moreover generally implemented with a large number of options. For this reason, computing devices, in particular MPI-type interfaces, are generally provided with a processor dedicated to this type of operation. Using a dedicated processor makes it possible to handle the search for a list element ("matching") in software rather than in hardware. This offers greater flexibility, because the code driving the processor ("microcode" or "firmware") can evolve, for example when the interface specification changes. To obtain good performance from the processor, its execution time, and therefore its number of operating cycles, must be reduced. Indeed, the execution time of the processing in the processor affects the throughput of the messages handled by the interface.
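The list search described in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Expected` and `match_and_remove`, the field layout and the mask convention (a 1 bit means "compare this bit") are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Expected:
    tag: int        # label of the expected message
    mask: int       # assumption: 1 bits mark the label bits that are compared
    location: str   # where a matching message should be stored

def match_and_remove(expected: list[Expected], incoming_tag: int) -> Optional[str]:
    """Walk the list in order; on the first masked match, delete the item
    and return its storage location. Return None if nothing matches."""
    for i, item in enumerate(expected):
        if (incoming_tag & item.mask) == (item.tag & item.mask):
            del expected[i]          # the list is updated dynamically
            return item.location
    return None

posted = [Expected(0x12, 0xFF, "buf_a"), Expected(0x30, 0xF0, "buf_b")]
first = match_and_remove(posted, 0x3A)    # matches the second item on its upper nibble
second = match_and_remove(posted, 0x3A)   # that item has been removed, no match is left
```

Every comparison in the loop is a candidate for a conditional branch in firmware, which is why the cost of branches matters so much for this workload.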
[0002] Moreover, it is desirable to make firmware easier for developers to write. Firmware is written in assembly language and therefore does not benefit from the high-level control structures offered by other types of language. An error in assembly code can have serious and direct consequences on the processor without being caught. It may also be desirable to keep the machine instructions executed by the processor at a reasonable size. The document Hemmert et al., "An Architecture to Perform NIC Based MPI Matching", discloses a predicate-based processor for controlling the flow of executed machine instructions. The machine instructions are executed according to values stored in predicate registers, which hold logical combinations (of AND and OR type) of bitwise comparison results. The predicate registers represent the conditions to be fulfilled for the instructions to be executed.
[0003] In this document, flow control relies on branch instructions conditioned on the value of a predicate register bit. Recall that a branch consists in leaving the sequential order of the instructions: instead of executing the next instruction of the code, execution goes directly to an earlier or later instruction. A branch can thus be made forwards or backwards in the code.
[0004] To extract the execution options of the instructions, the comparisons are made by a ternary comparison unit (TALU), which compares two values under a comparison mask. However, this type of processor has several disadvantages.
[0005] For example, the number of cycles required to execute a code is high, in particular because branching is used massively as the means of control. The document reports two cycles per branch. However, it describes a study processor with single-cycle memory access and no error correction code (for example of the ECC type). Such a processor cannot realistically be used in industrial applications, where five cycles are usually required to perform a branch. In addition, the processor presented uses one conventional arithmetic and logic unit (ALU) and one ternary unit (TALU). Parallel computation is therefore not possible, and the instruction size is poorly exploited, even though at 164 bits it would normally allow operations to be executed in parallel. There is therefore a need to improve the processors of the prior art, in particular processors dedicated to searching for matching elements in lists. The present invention falls within this framework. A first aspect of the invention relates to a method of processing machine instructions by a processor, comprising the following steps:
- receiving a machine instruction to be executed, said machine instruction comprising the identification of at least one first operation to be executed and a conditional prefix representing a condition to be verified in order to execute said at least one first operation,
- evaluating said prefix, and
- executing or not executing said at least one first operation identified in said machine instruction, according to whether said condition is verified or not.
[0006] A method according to the first aspect provides optimized control of the execution of instructions and reduces the use of branches. It speeds up processing by reducing the number of cycles required to execute instructions. For example, a method according to the first aspect makes it possible to search for matching elements in a reduced number of cycles in a computer system. According to embodiments, the evaluation of said prefix includes checking a value of a predicate register. Said conditional prefix may comprise: - an identification of said value of said predicate register, and - an identification of a second operation to be performed on said value for said check. For example, said second operation is a logical operation. As another example, said second operation is a wait for said value to be reached. Said second operation can also be an empty operation, in which case the condition is always verified. According to embodiments, said at least one first operation is a branch to another machine instruction of a code to be executed by said processor. For example, said branch is conditioned on the evaluation of a predicate vector comprising a plurality of values of one or more predicate registers. Said predicate vector can be evaluated in part. According to embodiments, said at least one first operation represents a predicate calculation, said calculation comprising: determining a bit of a calculation register, comparing said bit with a predetermined value, and writing the result of said comparison into a predicate register. For example, said machine instruction comprises the identification of two first operations to be executed in parallel.
[0007] As a further example, the execution of the machine instructions by the processor is managed by a processing chain (pipeline) module, and said machine instructions are executed at the execution stage of said processing chain. Each machine instruction may be executed in one processor cycle. Said machine instructions can be represented by code words with a predefined format. A second aspect of the invention relates to a processor configured to carry out a method according to the first aspect of the invention. For example, such a processor comprises: - a memory configured to store at least one machine instruction to be executed, said machine instruction comprising the identification of at least one first operation to be executed and a conditional prefix representing a condition to be verified in order to execute said at least one first operation, - a management module configured to evaluate said prefix and to execute or not said at least one first operation identified in said machine instruction, according to whether said condition is verified or not, and - a processing unit configured to execute said at least one identified first operation. A third aspect of the invention relates to a device for searching for matching elements in a list, comprising a processor according to the second aspect. For example, this device is an associative list processing unit (ALPU, "Associative List Processing Unit").
[0008] Other characteristics and advantages of the invention will appear on reading the detailed description which follows, given by way of nonlimiting example, and the appended figures, in which:
- Fig. 1A schematically illustrates a processor according to embodiments;
- Fig. 1B illustrates the steps of processing a flow of instructions according to embodiments;
- Fig. 2 illustrates a branch;
- Figs. 3A-3B and 4A-4G illustrate instructions according to embodiments.
In the following, a processor architecture according to embodiments is described. A very simplified illustration of a processor is given in Figure 1A. It is intended to present the elements discussed in the rest of the description. The person skilled in the art will understand that other elements are needed for the complete operation of the processor; they are not presented, for the sake of brevity. A memory 100 stores the machine instructions to be executed by the processor.
[0009] The execution of these instructions is managed by a processing chain module 101 ("pipeline"). The operations required by the instructions are carried out by a processing unit 102 of the ALU type ("arithmetic and logic unit").
[0010] The data needed to execute the instructions are stored in one or more registers 103 that the processing unit can read and/or write. The processing chain module manages the instruction flow in seven execution "stages" (or steps), illustrated in FIG. 1B:
1) sending 104 the address of the instruction to be executed to the instruction memory,
2) an empty clock cycle 105 to allow for the latency of reading the instruction from memory,
3) checking 106 the received instruction, for example by checking dedicated bits of an error correction code (ECC),
4) decoding 107 the instruction,
5) reading 108 the operands of the instruction,
6) executing 109 the operations, and
7) writing 110 the results of the instruction into one or more registers.
Returning to FIG. 1A, to control the execution of the instructions by the processing chain module, the processor comprises one or more predicate registers 111. Depending on the bit values in this register, certain instructions may or may not be executed. Moreover, branching between the instructions of the code executed by the processor is handled by a dedicated branch unit 112.
[0011] In the prior art, branching is the technique commonly used to control the execution flow of instructions. However, branching consumes many processor cycles. It is therefore proposed here to reduce its use. The branch is first described below with reference to FIG. 2. It is assumed that a set of instructions of a code, denoted A to Z, are to be executed sequentially. In other words, the code comprises instruction A, then instruction B, then instruction C, and so on. It is also assumed that, during the execution of the code, certain instructions must be executed conditionally. For example, if a condition COND is fulfilled, instruction E, which directly follows instruction D, is not executed, and instruction O is executed instead. The execution of instructions E to N is thus "skipped". In other words, the code "branches" to instruction O.
[0012] A branch therefore involves not passing instruction E to execution stage 6 (step 109 of Figure 1B). It also involves flushing from the processing chain all the instructions that follow E, so that instruction O can enter. In Figure 2, the content of each stage of the processing chain is shown for cycles T1 to T16.
[0013] In the first cycle T1, instruction A is requested by the processing chain module. At cycle T2, the response from the instruction memory module is awaited: instruction A passes from stage 1 to stage 2, which frees stage 1. At cycle T2, instruction B can thus be requested from the instruction memory module.
[0014] At cycle T3, instruction A is received and therefore moves to stage 3 to be checked. Instruction B moves to stage 2, which frees stage 1 for instruction C. As the cycles progress, the instructions enter the processing chain module and pass through the stages one after another.
[0015] In cycle T7, the processing chain is completely full. At cycle T8, instruction A leaves the chain. In cycle T9 the branch is requested, for example after the branch unit 112 has checked a condition in the predicate register 111. Thus, for example, instruction E must not be executed and the code must continue from instruction O. Instruction E is not passed to stage 6; in cycle T10 it is replaced by an empty instruction, generally called "NOP". In addition, instructions I, H, G and F, which were to move to stages 2, 3, 4 and 5, are flushed. Instruction J, which was to enter stage 1 in cycle T10, is replaced by instruction O (instead of requesting J, the processing chain module requests O). From this cycle on, the sequence of instructions in the stages of the processing chain module continues normally. It is thus only in cycle T15 that instruction O reaches execution stage 6: the branch has cost 5 cycles.
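The timeline of FIG. 2 can be reproduced with a small simulation. The sketch below is a simplified model written for illustration only: the function `simulate` and its parameters are hypothetical names, and the model assumes that one instruction is fetched per cycle and that the redirect takes effect in the cycle given by `redirect_cycle` (T10 in the text).

```python
STAGES = 7      # the seven stages of FIG. 1B
EXECUTE = 6     # execution is stage 6

def simulate(program, target, redirect_cycle, last_cycle):
    """Return {instruction: cycle in which it occupies the execution stage}."""
    pipe = [None] * STAGES              # pipe[0] is stage 1, pipe[6] is stage 7
    pc = 0
    executed = {}
    for cycle in range(1, last_cycle + 1):
        pipe = [None] + pipe[:-1]       # every instruction advances one stage
        if cycle == redirect_cycle:
            # Flush stages 2 to 6: the skipped instruction and those fetched after it.
            for stage in range(2, EXECUTE + 1):
                pipe[stage - 1] = None
            pc = program.index(target)  # fetch the branch target instead
        if pc < len(program):
            pipe[0] = program[pc]
            pc += 1
        name = pipe[EXECUTE - 1]
        if name is not None:
            executed[name] = cycle
    return executed

program = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
timeline = simulate(program, target="O", redirect_cycle=10, last_cycle=16)
```

In this model instruction D executes in cycle 9, instruction E never executes, and instruction O executes in cycle 15, which leaves five empty execution cycles (T10 to T14), as stated in the paragraph above.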
[0016] Branches are thus costly in processor cycles and execution time. For this reason, according to embodiments, it is sought to avoid them. A new structure is therefore proposed for controlling the execution flow of instructions, one that depends less on branches. The embodiments do not, however, exclude the use of branches. It is proposed to replace the branches used to test instruction execution conditions with conditional prefixes in the processor instructions. A new instruction format is thus proposed, as shown schematically in Figure 3A. The instructions 300 according to embodiments comprise two parts: a first part 301, called the "prefix", contains the condition to be verified in order to execute or not the operation or operations contained in the second part 302. For example, the instructions are coded on 64 bits and one byte (8 bits) is used to encode the prefix. The prefix may itself comprise two sub-parts, as illustrated in FIG. 3B: a sub-part 302 identifying the type of prefix (for example a code from 0 to 3 on two bits) and a sub-part 303 containing the condition to be checked. The prefixes are evaluated at the operand-reading stage. Depending on the result, the operation or operations contained in part 302 are executed or not. Several types of prefix can be envisaged.
[0017] For example, the code "0" can identify an empty condition. In this case, the operation or operations in part 302 are executed unconditionally. In assembly language, the instruction can be written directly: the absence of a condition before the instruction is equivalent to an empty condition.
[0018] The code "1" can, for example, identify a wait condition. In this case, the operation or operations contained in part 302 are executed only once the condition contained in part 303 is fulfilled. This prefix is equivalent to a branch to the address of the current instruction. It can be written "wait_for" in assembly language. As another example, the code "2" can identify a check that a condition is fulfilled: if condition 303 is true, the operation or operations contained in part 302 are executed; otherwise no operation is performed (the instruction is replaced by an empty instruction, generally called "NOP"). This prefix can be written "do_if" in assembly language. The code "3" can identify a check that a condition is not fulfilled: if condition 303 is false, the operation or operations contained in part 302 are executed; otherwise no operation is executed (they are replaced by an empty instruction, "NOP"). This prefix can be written "do_if_not" in assembly language. The condition of part 303 can be represented by the address of one or more bits of a predicate register, or of a register storing the result of comparing a predicate register bit with a fixed value.
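The prefix mechanism can be illustrated with a short sketch. The text only fixes the 64-bit instruction size, the one-byte prefix and the two-bit prefix code; everything else here is an assumption made for the example: the bit positions, the six-bit condition field used as a predicate index, and the names `pack`, `unpack` and `decide`.

```python
EMPTY, WAIT_FOR, DO_IF, DO_IF_NOT = 0, 1, 2, 3   # prefix codes 0 to 3 from the description

def pack(kind: int, cond: int, body: int) -> int:
    """Build a 64-bit word laid out as [kind:2][cond:6][body:56] (assumed layout)."""
    if not (0 <= kind < 4 and 0 <= cond < 64 and 0 <= body < (1 << 56)):
        raise ValueError("field out of range")
    return (kind << 62) | (cond << 56) | body

def unpack(word: int) -> tuple[int, int, int]:
    """Split a 64-bit word back into (kind, cond, body)."""
    return word >> 62, (word >> 56) & 0x3F, word & ((1 << 56) - 1)

def decide(kind: int, predicate: bool) -> str:
    """Outcome of one prefix evaluation: 'execute', 'nop' or 'stall'."""
    if kind == EMPTY:
        return "execute"                          # unconditional
    if kind == WAIT_FOR:
        return "execute" if predicate else "stall"
    if kind == DO_IF:
        return "execute" if predicate else "nop"
    if kind == DO_IF_NOT:
        return "nop" if predicate else "execute"
    raise ValueError(kind)

predicates = [False, True]                        # p0 is false, p1 is true
word = pack(DO_IF_NOT, 0, 0xABC)                  # do_if_not(p0), <operation body>
kind, cond, body = unpack(word)
outcome = decide(kind, predicates[cond])          # p0 is false, so the body executes
```

Whatever the outcome, the instruction stays in the pipeline and nothing is flushed, which is where the saving over a taken branch comes from.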
[0019] Figs. 4A-4G illustrate examples of instructions that may be contained in part 302 of the structure shown in Fig. 3A. Each instruction is identified by an instruction code in a field 400. Figure 4A illustrates an empty instruction "NOP". As it is an empty instruction, once identified by the field 400, the remaining part 401 does not contain any particular information. It is not used. Figure 4B illustrates a stop instruction "STOP". It indicates the end of a program. Once identified by the field 400, the remaining part 402 does not contain any particular information. It is not used.
[0020] Figure 4C illustrates an instruction with an immediate operand. It is an instruction in which one of the operands is represented not by the memory address that stores it but directly by its value, for example coded on 4 bytes in a field 403. The instruction further includes a field 404 representing the memory address of the second operand, a field 405 representing the memory address at which to store the result of the instruction, and a field 406 containing the code of the operation to be executed by the ALU processing unit on the two operands. Since directly coding an operand takes a lot of space, the instruction of Figure 4C does not allow several operations to be executed in parallel. Figure 4D illustrates a single-operation instruction, that is, an instruction that commands only one operation by the ALU processing unit. Here both operands are represented by the memory addresses that store them. The instruction thus comprises a field 407 representing the memory address of the second operand, a field 408 representing the memory address of the first operand, a field 409 representing the memory address at which the result of the instruction is stored, and a field 410 containing the code of the operation to be performed by the ALU processing unit on the two operands. Field 411 is unused. Figure 4E illustrates a two-operation instruction, that is, an instruction that commands the execution of two operations by the ALU processing unit. The operations are represented by fields 412 and 413 respectively. In field 412 (respectively 413), a field 414 (respectively 418) represents the memory address of the second operand, a field 415 (respectively 419) represents the memory address of the first operand, a field 416 (respectively 420) represents the memory address at which to store the result of the instruction, and a field 417 (respectively 421) contains the code of the operation to be executed by the ALU processing unit on the two operands. Two operations can be performed in parallel because the size of the instruction 300 allows it. Figure 4F illustrates an instruction with a branch and an operation, that is, an instruction that commands the execution of a branch by the processing chain module and of an operation by the ALU processing unit. The branch is represented by a field 422 and the operation by a field 423.
[0021] In field 423, a field 424 represents the memory address of the second operand, a field 425 represents the memory address of the first operand, a field 426 represents the memory address at which to store the result of the instruction, and a field 427 contains the code of the operation to be performed by the ALU processing unit on the two operands. In field 422, a field 428 represents the memory address of the instruction to which the branch points (the "jump" is made to this address), a field 429 represents the predicate vector of the predicate register to be checked (according to embodiments, it is desired to check several predicates in a single operation, hence the term predicate vector, described below), a field 430 contains the target values for the predicate vector (the values for which the branch condition is considered fulfilled), and a field 431 contains the code representing the branch.
[0022] A branch and an operation can be executed in parallel because the size of the instruction 300 allows it. Figure 4G illustrates a branch-only instruction, that is, an instruction that commands only a branch by the processing chain module.
[0023] A field 432 represents the memory address of the instruction to which the branch points (the "jump" is made to this address), a field 433 represents the predicate vector of the predicate register to be checked (according to embodiments, it is desired to check several predicates in a single operation, hence the term predicate vector, described below), a field 434 contains the target values for the predicate vector (the values for which the branch condition is considered fulfilled), and a field 435 contains the code representing the branch. Field 436 is unused. Other types of instruction are possible. For example, instructions can be provided that comprise compound operations ("compound instructions"), making it possible for example to combine elementary operations of the ALU processing unit. It is also possible to combine a compound operation with a conventional operation or with a branch, or to execute two compound operations in parallel in the same instruction. The use of a conditional prefix as described above saves processor cycles, as the very simplified example below shows. Recall that a branch that is taken (condition fulfilled) consumes five cycles, whereas a branch that is not taken (condition not fulfilled) consumes only one cycle. Consider the following code, which uses branches:
    CODE_A
    1: op0                    // Executes an operation op0.
    2: branch_if_not(p0) L0   // Branches to line L0 if the predicate p0 is false.
    3: op1                    // Executes an operation op1. This line is executed when p0 is true.
    4: branch L1              // Unconditional branch to line L1, because the next line is L0, which must not be executed if p0 is true.
    5: L0: op2                // Executes an operation op2.
    6: L1: op3                // Executes an operation op3.
[0024] The code above thus executes op0 and then, depending on whether p0 is false or true:
- op2 then op3 (p0 false), or
- op1 then op3 (p0 true).
In terms of cycles, in the case p0 = false, 8 cycles are consumed:
- one cycle for op0,
- five cycles for the branch taken on line 2,
- one cycle for op2,
- one cycle for op3.
In the case p0 = true, 9 cycles are consumed:
- one cycle for op0,
- one cycle for the branch not taken on line 2,
- one cycle for op1,
- five cycles for the branch taken on line 4,
- one cycle for op3.
[0025] Now consider the following code, which executes the same program but with prefixed conditional instructions as discussed above:
    CODE_B
    1: op0                    // Executes the operation op0.
    2: do_if(p0), op1         // Executes op1 provided that p0 is true.
    3: do_if_not(p0), op2     // Executes op2 provided that p0 is false.
    4: op3                    // Executes op3.
For line 2, in the example of FIG. 3A, field 301 would contain the address of the predicate p0 and the code of the IF operation of the ALU processing unit. Field 302 would contain the representation of op1. The formats of FIG. 4D or 4C could also be used if the processor can implement wide instructions with parallel operations. It can already be seen that the code is simpler to write: it has only four lines, against six previously. In addition, it uses no branch.
[0026] In terms of cycles, in the case p0 = false, 4 cycles are consumed:
- one cycle for op0,
- one cycle for the conditional instruction of line 2,
- one cycle for the conditional instruction of line 3,
- one cycle for op3.
In the case p0 = true, 4 cycles are also consumed. With a processor according to the invention, the same program thus executes in 4 cycles instead of 8 or 9. According to embodiments, the execution of programs can be accelerated even further.
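The cycle counts for the two versions of the program can be checked with a minimal cycle-counting model. The costs (five cycles for a taken branch, one cycle otherwise) are those stated in the description; the tuple encoding of the programs and the function name `run` are illustrative assumptions.

```python
TAKEN, NOT_TAKEN, OP = 5, 1, 1      # cycle costs stated in the description

def run(program, labels, preds):
    """Count cycles. Each entry is (kind, predicate, target) with kind in
    'op', 'do_if', 'do_if_not', 'branch', 'branch_if_not'."""
    pc = cycles = 0
    while pc < len(program):
        kind, pred, target = program[pc]
        taken = kind == "branch" or (kind == "branch_if_not" and not preds[pred])
        if taken:
            cycles += TAKEN
            pc = labels[target]
        else:
            # A plain operation, a prefixed operation (executed or turned
            # into a NOP) and a branch that is not taken all cost one cycle.
            cycles += NOT_TAKEN if kind.startswith("branch") else OP
            pc += 1
    return cycles

CODE_A = [
    ("op", None, None),               # 1: op0
    ("branch_if_not", "p0", "L0"),    # 2: branch_if_not(p0) L0
    ("op", None, None),               # 3: op1
    ("branch", None, "L1"),           # 4: branch L1
    ("op", None, None),               # 5: L0: op2
    ("op", None, None),               # 6: L1: op3
]
LABELS_A = {"L0": 4, "L1": 5}         # zero-based indices of lines 5 and 6

CODE_B = [
    ("op", None, None),               # 1: op0
    ("do_if", "p0", None),            # 2: do_if(p0), op1
    ("do_if_not", "p0", None),        # 3: do_if_not(p0), op2
    ("op", None, None),               # 4: op3
]

for p0 in (False, True):
    print(p0, run(CODE_A, LABELS_A, {"p0": p0}), run(CODE_B, {}, {"p0": p0}))
```

The model gives 8 and 9 cycles for CODE_A and 4 cycles for CODE_B in both cases, in agreement with the counts above.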
[0027] For this purpose, a bitwise comparison operation is introduced that tests one bit of a register and writes the result into a predicate register. Since this type of comparison is used very frequently to obtain the value of the condition to be tested (p0 in the example above), it is advantageous to give the processor a dedicated operation. Such an operation can be written as: cmp_bit_1_to_reg2[28], p0. This operation compares the bit at position 28 of register reg2 with the value "1" and writes the result ("1" for true, "0" for false) into the predicate p0. It therefore takes as operands the position of the bit to be tested (28), the address of register reg2 and the address of predicate p0. In a typical processor, two cycles would be needed to reach the same result:
    1: and reg2, 0x10000000, reg0   // Logical AND between the contents of register reg2 and the mask 0x10000000 (hexadecimal, only bit 28 set); the result is stored in register reg0.
    2: cmp_neq_to reg0, 0, p0       // Compares the contents of register reg0 with 0 and stores the result in the predicate p0. If reg0 is not 0, p0 is true; if reg0 is 0, p0 is false.
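The equivalence between the dedicated bit test and the two-operation sequence can be sketched as follows. The Python function names are hypothetical; they only mirror the behaviour described above.

```python
def cmp_bit(reg: int, pos: int, expected: int) -> bool:
    """Single-operation form: compare bit `pos` of `reg` with `expected` (0 or 1)."""
    return ((reg >> pos) & 1) == expected

def and_then_compare(reg: int, mask: int) -> bool:
    """Two-operation form: AND with a one-bit mask, then test the result against 0."""
    tmp = reg & mask        # and reg2, mask, reg0
    return tmp != 0         # cmp_neq_to reg0, 0, p0

reg2 = 0x1000_0001                                    # bits 28 and 0 are set
p0_single = cmp_bit(reg2, 28, 1)                      # one operation
p0_double = and_then_compare(reg2, 1 << 28)           # two operations, same predicate value
```

Note that the single-operation form only needs the bit position (a small number) rather than a full-width mask, which matters for the instruction size.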
[0028] In the sample code CODE_A given above, two predicates are used, p0 and p1. It would take four cycles in this code to obtain their values. With a dedicated operation (of the type cmp_bit_1_to_reg2[28], p0) this number can be reduced to two. It can even be reduced to one cycle if the processor allows instructions with parallel operations. Thus, instead of obtaining the predicates p0 and p1 as follows:
    1: and reg2, 0x10000000, reg0           // The value of the bit at position 28 of register reg2 is obtained and stored in reg0.
    2: and reg2, 0x0100000000000000, reg1   // The value of the bit at position 56 of register reg2 is obtained and stored in reg1.
    3: cmp_neq_to reg0, 0, p0               // Compares the contents of register reg0 with 0 and stores the result in the predicate p0. If reg0 is not 0, p0 is true; if reg0 is 0, p0 is false.
    4: cmp_eq_to reg1, 0, p1                // Compares the contents of register reg1 with 0 and stores the result in the predicate p1. If reg1 is 0, p1 is true; if reg1 is not 0, p1 is false.
it is possible to write a single line:
    1: cmp_bit_1_to_reg2[28], p0 || cmp_bit_0_to_reg2[56], p1
The double vertical bar means that the operations are executed in parallel. Parallel execution is possible here because the position to be tested in the register is not handled as an immediate value (unlike in the prior art). Recall that immediate values rule out parallel operations because they require a large number of bits (as explained with reference to FIG. 4C).
[0029] As already mentioned, the use of prefixed conditional instructions does not rule out the use of branches. Embodiments nevertheless improve on known branches.
[0030] Still with the aim of saving processor cycles, it is proposed to make branches depend on predicate vectors so that several conditions can be tested at once. Such a branch instruction can be written as: branch_if_veq (abcd), L0. It branches to the instruction at line L0 if the predicate vector {p3, p2, p1, p0} equals {a, b, c, d}. The parameters a, b, c and d can take the value 0 or 1, or x when the corresponding predicate need not be tested. For example, if the condition only concerns the predicates p0 and p1, an instruction of the type branch_if_veq (xxcd), L0 can be written.
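One plausible way to evaluate such a pattern is to turn it into a mask (which predicates are tested) and a target (their wanted values), in the spirit of the predicate vector and target value fields of the branch instructions of FIGS. 4F and 4G. The encoding below is an assumption made for illustration; the patent does not specify the bit-level representation, and the names `to_fields` and `branch_taken` are hypothetical.

```python
def to_fields(pattern: str) -> tuple[int, int]:
    """Turn a pattern such as 'xx01' into (mask, target).
    The leftmost character is the highest predicate (p3 for four characters)."""
    mask = target = 0
    for ch in pattern:
        mask, target = mask << 1, target << 1
        if ch != "x":               # 'x' marks a predicate that is not tested
            mask |= 1
            target |= int(ch)       # ch is '0' or '1'
    return mask, target

def branch_taken(pred_bits: int, pattern: str) -> bool:
    """pred_bits holds the predicates as an integer, p0 in bit 0."""
    mask, target = to_fields(pattern)
    return (pred_bits & mask) == target

# branch_if_veq (xx00): taken only when p1 and p0 are both false,
# whatever the values of p3 and p2.
taken = branch_taken(0b1100, "xx00")
not_taken = branch_taken(0b0001, "xx00")
```

A single masked comparison replaces what would otherwise be a chain of logical operations on predicates followed by a branch on the combined result.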
[0031] In what follows, various features described above are combined in the same code to gain in processor cycles. We start by presenting a code as it would be written and executed according to the prior art, then we present an optimized code to save cycles while performing the same operations.
[0032] Consider the following code, written with instructions according to the prior art:

CODE_C
1: op1 || op2 // Execute operations op1 and op2 in parallel.
2: and reg2, 0x10000000, reg0 // The hexadecimal mask extracts the bit at position 28 of register reg2; the result is stored in reg0.
3: and reg2, 0x0100000000000000, reg1 // The bit at position 56 of register reg2 is extracted and stored in reg1.
4: cmp_neq_to reg0, 0, p0 // Compare the contents of register reg0 to 0 and store the result in predicate p0. Thus, if reg0 is not 0, p0 is true, and if reg0 is 0, p0 is false.
5: cmp_eq_to reg1, 0, p1 // Compare the contents of register reg1 to 0 and store the result in predicate p1. Thus, if reg1 is 0, p1 is true, and if reg1 is different from 0, p1 is false.
6: op3 || op4 // Execute operations op3 and op4 in parallel.
7: branch_if_not (p0) L0 // Branch to label L0 if predicate p0 is false.
8: op5 // Execute operation op5. This line is executed when p0 is true.
9: branch L1 // Unconditional branch to label L1, because the following line is L0, which must not be executed if p0 is true.
10: L0: op6 // Execute operation op6.
11: L1: op7 // Execute operation op7.
12: branch_if_not (p1) L2 // Branch to label L2 if predicate p1 is false.
13: op8 // Execute operation op8. This line is executed when p1 is true.
14: branch L3 // Unconditional branch to label L3, because the following line is L2, which must not be executed if p1 is true.
15: L2: op9 // Execute operation op9.
16: L3: op10 // Execute operation op10.
17: or p0, p1, p2 // Compute predicate p2 as the logical OR of p0 and p1.
18: branch_if_not (p2) L4 // Branch to label L4 if predicate p2 is false.
19: op11 // Execute operation op11. This line is executed when p2 is true.
20: stop // End of the program, unless the branch to L4 was taken.
21: L4: op12 // Execute operation op12.

The code CODE_C therefore comprises the computation of three predicates p0, p1 and p2 as well as three conditional branches. It also includes simple operations and parallel executions (marked by the sign ||). Recalling that a conditional branch that is taken consumes five cycles and that a branch that is not taken (because its condition is not fulfilled) consumes only one cycle, it can be determined that for the case p0 = true and p1 = false (and conversely), this code runs in 25 cycles (the stop is also considered to consume only one cycle). For p0 = p1 = true or false, this code runs in 24 cycles. Here is the same code rewritten with prefixed conditional instructions and with the comparison and branch instructions mentioned in the description above:

CODE_D
1: op1 || op2 // Execute operations op1 and op2 in parallel.
2: cmp_bit_1_to reg2[28], p0 || cmp_bit_0_to reg2[56], p1 // Lines 2, 3, 4 and 5 of CODE_C are condensed here into a single line and a single instruction (with two parallel operations).
3: op3 || op4 // Execute operations op3 and op4 in parallel.
4: do_if (p0), op5 // Execute op5 if p0 is true.
5: do_if_not (p0), op6 // Execute op6 if p0 is false.
6: op7 // Execute operation op7.
7: do_if (p1), op8 // Execute op8 if p1 is true.
8: do_if_not (p1), op9 // Execute op9 if p1 is false.
9: op10 // Execute operation op10.
10: branch_if_veq (xx00) LAB0 // Branch to label LAB0 if the predicates p1 and p0 are both false. Note that this avoids computing the OR of line 17 of CODE_C.
11: op11 // Execute operation op11. This line is executed when p1 or p0 is true.
12: stop // End of the program, unless the branch to LAB0 was taken.
13: LAB0: op12 // Execute operation op12.
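The claim that both listings perform the same operations can be checked with a small model. The following Python sketch is only an illustration under stated assumptions: the function names (`run_code_c`, `run_code_d`, `vector_matches`, `all_cases_agree`) are hypothetical, each operation opN is reduced to a name appended to a trace, the mask of line 3 of CODE_C is taken to select bit 56 as its comment states, and the pattern xx00 is assumed to be written with the highest-numbered predicate first, with x meaning that the predicate is ignored.

```python
from itertools import product


def vector_matches(preds, pattern):
    """Partial predicate-vector test (assumed semantics of branch_if_veq).

    preds[i] holds predicate p_i; the pattern is read right to left, so its
    last character applies to p0. 'x' means the predicate is not examined.
    """
    for index, wanted in enumerate(reversed(pattern)):
        if wanted != "x" and preds[index] != (wanted == "1"):
            return False
    return True


def run_code_c(reg2):
    """Prior-art version: masks, compares, then branches around operations."""
    trace = ["op1", "op2"]
    reg0 = reg2 & (1 << 28)               # line 2: isolate bit 28
    reg1 = reg2 & (1 << 56)               # line 3: isolate bit 56 (assumed)
    p0 = reg0 != 0                        # line 4: cmp_neq_to
    p1 = reg1 == 0                        # line 5: cmp_eq_to
    trace += ["op3", "op4"]
    trace.append("op5" if p0 else "op6")  # lines 7-10
    trace.append("op7")
    trace.append("op8" if p1 else "op9")  # lines 12-15
    trace.append("op10")
    p2 = p0 or p1                         # line 17
    trace.append("op11" if p2 else "op12")
    return trace


def run_code_d(reg2):
    """Version with bit-compare instructions and conditional prefixes."""
    trace = ["op1", "op2"]
    p0 = ((reg2 >> 28) & 1) == 1          # cmp_bit_1_to reg2[28], p0
    p1 = ((reg2 >> 56) & 1) == 0          # cmp_bit_0_to reg2[56], p1
    trace += ["op3", "op4"]
    if p0:                                # do_if (p0), op5
        trace.append("op5")
    if not p0:                            # do_if_not (p0), op6
        trace.append("op6")
    trace.append("op7")
    if p1:                                # do_if (p1), op8
        trace.append("op8")
    if not p1:                            # do_if_not (p1), op9
        trace.append("op9")
    trace.append("op10")
    # Predicates p2 and p3 are unused here and masked out by 'xx'.
    if vector_matches((p0, p1, False, False), "xx00"):
        trace.append("op12")              # branch to LAB0 taken
    else:
        trace.append("op11")              # fall through, then stop
    return trace


def all_cases_agree():
    """Compare both versions for the four combinations of bits 28 and 56."""
    return all(
        run_code_c((b28 << 28) | (b56 << 56))
        == run_code_d((b28 << 28) | (b56 << 56))
        for b28, b56 in product((0, 1), repeat=2)
    )
```

In this model the four combinations of the two tested bits yield identical traces for both versions, which is the sense in which CODE_D performs the same operations without computing the OR of line 17.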
Note that the code is more compact here: it comprises only 13 lines of instructions against 21 for CODE_C. In the worst case (p1 = p0 = false), where the branch is taken, CODE_D executes in 15 cycles, which is much less than the 24 cycles of the best case for CODE_C.

The present invention has been described and illustrated in the present detailed description with reference to the accompanying figures. However, the present invention is not limited to the embodiments presented. Other variations and embodiments may be deduced and implemented by those skilled in the art upon reading the present description and the accompanying figures.
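The 15-cycle figure quoted above for CODE_D can be reproduced with a simple cost model. The sketch below is an assumption-laden illustration, not the patent's own accounting: it supposes that every instruction costs one cycle (including a prefixed instruction whose condition is not verified, and the stop), and that a taken branch costs five cycles, as stated in the description. The names `code_d_cycles`, `TAKEN_BRANCH_CYCLES` and `DEFAULT_CYCLES` are hypothetical.

```python
TAKEN_BRANCH_CYCLES = 5  # cost of a branch that is taken (from the description)
DEFAULT_CYCLES = 1       # assumed cost of every other instruction


def code_d_cycles(p0: bool, p1: bool) -> int:
    """Cycle count of CODE_D for given predicate values, under the model above."""
    # The first nine instructions (op1 || op2 through op10) are always issued,
    # whether or not their conditional prefix lets the operation execute.
    cycles = 9 * DEFAULT_CYCLES
    branch_taken = (not p0) and (not p1)  # branch_if_veq (xx00)
    if branch_taken:
        # Taken branch to LAB0, then op12.
        cycles += TAKEN_BRANCH_CYCLES + DEFAULT_CYCLES
    else:
        # Branch not taken, then op11 and stop.
        cycles += 3 * DEFAULT_CYCLES
    return cycles
```

Under these assumptions the case p0 = p1 = false gives 15 cycles, in line with the figure in the text, and the three other cases give 12 cycles.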
[0033] In the claims, the term "include" does not exclude other elements or other steps. The indefinite article "a" does not exclude the plural. The various features presented and/or claimed can advantageously be combined. Their presence in the description or in different dependent claims does not exclude the possibility of combining them. The reference signs cannot be understood as limiting the scope of the invention.

Claims (15):
[0001]
A method of processing machine instructions by a processor, comprising the following steps: - receiving (104, 105) a machine instruction (300) to be executed, said machine instruction comprising the identification of at least one first operation (302) to be executed and a conditional prefix (301) representing a condition to be verified in order to execute said at least one first operation, - evaluating (108) said prefix, and - executing or not executing (109) said at least one first operation identified in said machine instruction, depending on whether or not said condition is verified.
[0002]
The method of claim 1, wherein evaluating said prefix includes verifying a value of a predicate register.
[0003]
The method of claim 2, wherein said conditional prefix comprises: an identification of said value of said predicate register, and an identification of a second operation to be performed on said value for said verification.
[0004]
The method of claim 3, wherein said second operation is a logical operation.
[0005]
The method of claim 3, wherein said second operation is a wait for said value to be reached.
[0006]
The method of claim 3, wherein said second operation is an empty operation, whereby the condition is always verified.
[0007]
The method of any one of the preceding claims, wherein said at least one first operation is a branch to another machine instruction of a code to be executed by said processor.
[0008]
The method of claim 7, wherein said branching is conditioned by evaluating a predicate vector having a plurality of values of one or more predicate registers.
[0009]
The method of claim 8, wherein said predicate vector is evaluated in part.
[0010]
The method of any one of the preceding claims, wherein said at least one first operation represents a predicate calculation, said calculation comprising: - determining a bit of a calculation register, - comparing said determined bit with a predetermined value, and - writing a result of said comparison to a predicate register.
[0011]
The method of any one of the preceding claims, wherein said machine instruction comprises identifying two first operations to be executed in parallel.
[0012]
The method of any one of the preceding claims, wherein execution of the machine instructions by the processor is managed by a pipeline module and wherein said machine instructions are executed at the execution stage of said pipeline.
[0013]
The method of claim 12, wherein each machine instruction is executed in one processor cycle.
[0014]
The method of any of the preceding claims, wherein said machine instructions are represented by code words whose format is predefined.
[0015]
A processor comprising: - a memory (100) configured to store at least one machine instruction to be executed, said machine instruction comprising the identification of at least one first operation to be executed and a conditional prefix representing a condition to be verified in order to execute said at least one first operation, - a management module (101) configured to evaluate said prefix and to execute or not execute said at least one first operation identified in said machine instruction, depending on whether or not said condition is verified, and - a processing unit (102) configured to execute said at least one identified first operation.

Similar technologies:

Publication number | Publication date | Title

EP2947563B1|2020-10-21|Conditional instruction processor

JP6856749B2|2021-04-14|Systems and methods for implementing native contracts on the blockchain

US8909906B2|2014-12-09|Packet processor configured for processing features directed by branch instruction with logical operator and two feature selector fields

FR2686717A1|1993-07-30|MICROPROCESSOR HAVING A UNIT FOR EXECUTING PARALLEL INSTRUCTIONS.

US11106437B2|2021-08-31|Lookup table optimization for programming languages that target synchronous digital circuits

US10747533B2|2020-08-18|Selecting processing based on expected value of selected character

US20130326480A1|2013-12-05|Version labeling in a version control system

US10613862B2|2020-04-07|String sequence operations with arbitrary terminators

WO2018024093A1|2018-02-08|Operation unit, method and device capable of supporting operation data of different bit widths

EP2860656B1|2016-04-27|Method for execution by a microprocessor of a polymorphic binary code of a predetermined function

US8959501B2|2015-02-17|Type and length abstraction for data types

CA2348069A1|2001-11-23|Multi-resource architecture management system and method

Iyer et al.|2018|Building Games with Ethereum Smart Contracts

US8843730B2|2014-09-23|Executing instruction packet with multiple instructions with same destination by performing logical operation on results of instructions and storing the result to the destination

US20210117375A1|2021-04-22|Vector Processor with Vector First and Multiple Lane Configuration

US11204768B2|2021-12-21|Instruction length based parallel instruction demarcator

US11144238B1|2021-10-12|Background processing during remote memory access

KR20210156854A|2021-12-27|vector index register

EP1596282A2|2005-11-16|Apparatus and method of instruction set control in a microprocessor

US20200371886A1|2020-11-26|Multi-lane solutions for addressing vector elements using vector index registers

US20220019416A1|2022-01-20|Removing branching paths from a computer program

FR3008504A1|2015-01-16|METHOD FOR PROVIDING AN INSTRUCTION CODE AND CIRCUIT

WO2014195141A1|2014-12-11|Hardware accelerator for handling red-black trees

US11036503B2|2021-06-15|Predicate indicator generation for vector processing operations

US20200278868A1|2020-09-03|Method to execute successive dependent instructions from an instruction stream in a processor

Patent family:

Publication number | Publication date

EP2947563B1|2020-10-21|

JP2016006632A|2016-01-14|

EP2947563A1|2015-11-25|

US20150339122A1|2015-11-26|

US10338926B2|2019-07-02|

FR3021432B1|2017-11-10|

Cited references:

Publication number | Filing date | Publication date | Applicant | Title

GB2315890A|1996-07-30|1998-02-11|Mitsubishi Electric Corp|Microprocessor having conditional-execution instructions|

GB2352536A|1999-07-21|2001-01-31|Element 14 Ltd|Conditional instruction execution|

EP1267257A2|2001-06-11|2002-12-18|Broadcom Corporation|Conditional execution per data path slice|

US5659722A|1994-04-28|1997-08-19|International Business Machines Corporation|Multiple condition code branching system in a multi-processor environment|

US6009512A|1997-10-27|1999-12-28|Advanced Micro Devices, Inc.|Mechanism for forwarding operands based on predicated instructions|

US6260082B1|1998-12-23|2001-07-10|Bops, Inc.|Methods and apparatus for providing data transfer control|

JP2001306321A|2000-04-19|2001-11-02|Matsushita Electric Ind Co Ltd|Processor|

JP3564445B2|2001-09-20|2004-09-08|松下電器産業株式会社|Processor, compiling device and compiling method|

JP3627725B2|2002-06-24|2005-03-09|セイコーエプソン株式会社|Information processing apparatus and electronic apparatus|

US6865662B2|2002-08-08|2005-03-08|Faraday Technology Corp.|Controlling VLIW instruction operations supply to functional units using switches based on condition head field|

GB2402510A|2003-06-05|2004-12-08|Advanced Risc Mach Ltd|Predication instruction within a data processing system|

US20080016320A1|2006-06-27|2008-01-17|Amitabh Menon|Vector Predicates for Sub-Word Parallel Operations|

US20100011524A1|2008-07-21|2010-01-21|Gerald Oliver Roeback|Portable multi-function movable, electronic device display screen and glass cleaning accessory|

US9342304B2|2008-08-15|2016-05-17|Apple Inc.|Processing vectors using wrapping increment and decrement instructions in the macroscalar architecture|

US8181001B2|2008-09-24|2012-05-15|Apple Inc.|Conditional data-dependency resolution in vector processors|

US8082426B2|2008-11-06|2011-12-20|Via Technologies, Inc.|Support of a plurality of graphic processing units|

US9436651B2|2010-12-09|2016-09-06|Intel Corporation|Method and apparatus for managing application state in a network interface controller in a high performance computing system|

US9870305B2|2015-09-30|2018-01-16|International Business Machines Corporation|Debugging of prefixed code|

WO2014163040A1|2013-04-01|2014-10-09|Hoya Candeo Optronics株式会社|Near-infrared absorbing glass and method for manufacturing same|

US20170192788A1|2016-01-05|2017-07-06|Intel Corporation|Binary translation support using processor instruction prefixes|

US10628157B2|2017-04-21|2020-04-21|Arm Limited|Early predicate look-up|

JP2020126303A|2019-02-01|2020-08-20|富士通株式会社|Information processing apparatus, information processing program, and information processing method|

Legal status:
2015-04-22| PLFP| Fee payment|Year of fee payment: 2 |

2015-11-27| PLSC| Search report ready|Effective date: 20151127 |

2016-04-22| PLFP| Fee payment|Year of fee payment: 3 |

2017-04-21| PLFP| Fee payment|Year of fee payment: 4 |

2018-04-23| PLFP| Fee payment|Year of fee payment: 5 |

2019-05-28| PLFP| Fee payment|Year of fee payment: 6 |

2020-05-28| PLFP| Fee payment|Year of fee payment: 7 |

2021-05-26| PLFP| Fee payment|Year of fee payment: 8 |

Priority:

Application number | Publication number | Priority date | Filing date | Title

FR1454511A| FR3021432B1|2014-05-20|2014-05-20|PROCESSOR WITH CONDITIONAL INSTRUCTIONS|

JP2015084940A| JP2016006632A|2014-05-20|2015-04-17|Processor with conditional instructions|

US14/716,245| US10338926B2|2014-05-20|2015-05-19|Processor with conditional instructions|

EP15168292.9A| EP2947563B1|2014-05-20|2015-05-20|Conditional instruction processor|
